XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks


The XNOR-Net binarization approach seeks to identify the most accurate convolutional approximations. Specifically, XNOR-Net employs a scaling factor, which plays a vital role in the learning of BNNs, and improves the forward pass of BNNs as:

$$\mathbf{a}^n_{out} = \boldsymbol{\alpha}^n \circ \left( \mathbf{b}_{\mathbf{w}^n} \circledast \mathbf{b}_{\mathbf{a}^n_{in}} \right), \tag{3.3}$$

where $\boldsymbol{\alpha}^n = \{\alpha^n_1, \alpha^n_2, \ldots, \alpha^n_{C^n_{out}}\} \in \mathbb{R}^{C^n_{out}}_{+}$ is known as the channel-wise scaling factor vector, which mitigates the output gap between Eq. (3.1) and its approximation in Eq. (3.3).
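To make the approximation in Eq. (3.3) concrete, below is a minimal NumPy sketch for a single output channel at a single spatial location, where the convolution reduces to a dot product between a filter and an input patch; the variable names (`w`, `a_in`, `alpha`) and toy sizes are illustrative assumptions, not from the source.

```python
# Minimal sketch of the XNOR-Net forward pass in Eq. (3.3) for one output
# channel at one spatial location (the convolution reduces to a dot product).
import numpy as np

rng = np.random.default_rng(0)
C_in, K = 16, 3
w = rng.normal(size=(C_in, K, K))      # real-valued weight filter
a_in = rng.normal(size=(C_in, K, K))   # real-valued input patch

# Binarize weights and activations with the sign function.
b_w = np.where(w >= 0, 1.0, -1.0)
b_a = np.where(a_in >= 0, 1.0, -1.0)

# Channel-wise scaling factor (the CAM solution derived later in Eq. (3.10)).
alpha = np.abs(w).mean()

# Eq. (3.3): a dot product of +/-1 tensors, scaled by alpha. On binary
# hardware this reduces to XNOR and popcount; here it is simulated in floats.
out_binary = alpha * np.sum(b_w * b_a)
out_real = np.sum(w * a_in)            # full-precision reference of Eq. (3.1)
print(f"binary approx: {out_binary:.3f}  full precision: {out_real:.3f}")
```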

We denote $\mathcal{A} = \{\boldsymbol{\alpha}^n\}_{n=1}^{N}$. Since the weight values are binary, XNOR-Net can implement the convolution with additions and subtractions. In the following, we state the XNOR operation for a specific convolution layer, thus omitting the superscript $n$ for simplicity. Most existing implementations simply follow earlier studies [199, 159] to optimize $\mathcal{A}$ based on non-parametric optimization as:

$$\alpha^*, \mathbf{b}_{\mathbf{w}}^* = \arg\min_{\alpha, \mathbf{b}_{\mathbf{w}}} J(\alpha, \mathbf{b}_{\mathbf{w}}), \tag{3.4}$$

$$J(\alpha, \mathbf{b}_{\mathbf{w}}) = \left\| \mathbf{w} - \alpha \mathbf{b}_{\mathbf{w}} \right\|_2^2. \tag{3.5}$$
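As a quick illustration (not from the source), the objective of Eq. (3.5) is just the squared error between the real-valued filter and its scaled binary approximation, as the following NumPy sketch with illustrative names shows:

```python
# Sketch of the objective J of Eq. (3.5); all names are illustrative.
import numpy as np

def J(w: np.ndarray, alpha: float, b_w: np.ndarray) -> float:
    """J(alpha, b_w) = ||w - alpha * b_w||_2^2."""
    return float(np.sum((w - alpha * b_w) ** 2))

rng = np.random.default_rng(0)
w = rng.normal(size=27)                 # a flattened C_in x K x K filter
b_w = np.where(w >= 0, 1.0, -1.0)       # one candidate binary filter
for alpha in (0.25, 0.5, 1.0):          # the error depends strongly on alpha
    print(alpha, J(w, alpha, b_w))
```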

By expanding Eq. (3.5), we have:

$$J(\alpha, \mathbf{b}_{\mathbf{w}}) = \alpha^2 (\mathbf{b}_{\mathbf{w}})^T \mathbf{b}_{\mathbf{w}} - 2\alpha \mathbf{w}^T \mathbf{b}_{\mathbf{w}} + \mathbf{w}^T \mathbf{w}, \tag{3.6}$$

where $\mathbf{b}_{\mathbf{w}} \in \mathcal{B}$. Since every entry of $\mathbf{b}_{\mathbf{w}}$ is $\pm 1$, $(\mathbf{b}_{\mathbf{w}})^T \mathbf{b}_{\mathbf{w}} = C_{in} \times K \times K$, and $\mathbf{w}^T \mathbf{w}$ is also a constant because $\mathbf{w}$ is a known variable. Thus, Eq. (3.6) can be rewritten as:

$$J(\alpha, \mathbf{b}_{\mathbf{w}}) = \alpha^2 \times C_{in} \times K \times K - 2\alpha \mathbf{w}^T \mathbf{b}_{\mathbf{w}} + \text{constant}. \tag{3.7}$$

Since only the middle term of Eq. (3.7) depends on $\mathbf{b}_{\mathbf{w}}$, the optimal solution can be achieved by solving the following constrained maximization:

$$\mathbf{b}_{\mathbf{w}}^* = \arg\max_{\mathbf{b}_{\mathbf{w}}} \mathbf{w}^T \mathbf{b}_{\mathbf{w}}, \quad \text{s.t.} \ \mathbf{b}_{\mathbf{w}} \in \mathcal{B}, \tag{3.8}$$

Because the objective decomposes elementwise and each term $w_i b_{w_i}$ is maximized when $b_{w_i}$ shares the sign of $w_i$, Eq. (3.8) is solved by the sign function:

$$b_{w_i}^* = \begin{cases} +1, & \text{if } w_i \ge 0, \\ -1, & \text{if } w_i < 0, \end{cases}$$
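In code, this optimal binarization is a one-liner. The sketch below is illustrative rather than the authors' implementation; it uses `np.where` instead of `np.sign` so that zero weights map to $+1$, matching the case split above:

```python
import numpy as np

def binarize_weights(w: np.ndarray) -> np.ndarray:
    """Optimal binary weights of Eq. (3.8): +1 where w_i >= 0, else -1.
    np.sign is avoided because it maps 0 to 0 rather than +1."""
    return np.where(w >= 0.0, 1.0, -1.0)
```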

This sign function is the optimal solution and has also been widely adopted as a general solution for BNNs in numerous subsequent works [159]. To find the optimal value of the scaling factor $\alpha$, we take the derivative of $J(\cdot)$ w.r.t. $\alpha$ and set it to zero, i.e., $\partial J / \partial \alpha = 2\alpha \times C_{in} \times K \times K - 2\mathbf{w}^T \mathbf{b}_{\mathbf{w}}^* = 0$, which gives:

$$\alpha^* = \frac{\mathbf{w}^T \mathbf{b}_{\mathbf{w}}^*}{C_{in} \times K \times K}. \tag{3.9}$$

By replacing $\mathbf{b}_{\mathbf{w}}^*$ with the sign of $\mathbf{w}$ and noting that $\mathbf{w}^T \operatorname{sign}(\mathbf{w}) = \|\mathbf{w}\|_1$, a closed-form solution of $\alpha$ can be derived via the channel-wise absolute mean (CAM) as:

$$\alpha_i^* = \frac{\|\mathbf{w}_{i,:,:,:}\|_1}{C_{in} \times K \times K} = \frac{\|\mathbf{w}_{i,:,:,:}\|_1}{M}, \tag{3.10}$$

where $M = C_{in} \times K \times K$. Therefore, the optimal estimation of a binary weight filter can be achieved simply by taking the sign of the weight values, and the optimal scaling factor is the average of the absolute weight values.
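The optimality of this closed-form pair can be checked numerically. The sketch below (an illustration under assumed toy dimensions, not from the source) brute-forces all $2^M$ sign vectors of a tiny filter, pairs each with its best $\alpha$ from Eq. (3.9), and confirms that $\operatorname{sign}(\mathbf{w})$ with the CAM scaling attains the minimum of Eq. (3.5):

```python
# Brute-force check that b_w = sign(w), alpha = mean(|w|) minimizes
# J(alpha, b_w) = ||w - alpha * b_w||_2^2 over all binary filters.
import itertools
import numpy as np

rng = np.random.default_rng(1)
M = 8                                   # tiny filter: M = C_in * K * K
w = rng.normal(size=M)

def J(alpha: float, b_w: np.ndarray) -> float:
    return float(np.sum((w - alpha * b_w) ** 2))

# Closed-form solution: sign binarization plus channel-wise absolute mean.
b_star = np.where(w >= 0, 1.0, -1.0)
alpha_star = np.abs(w).mean()           # equals w @ b_star / M, per Eq. (3.9)

# For each candidate sign vector, the best alpha is w @ b / M (Eq. (3.9)).
best_J = min(
    J(np.dot(w, np.array(bits)) / M, np.array(bits))
    for bits in itertools.product([-1.0, 1.0], repeat=M)
)
assert np.isclose(J(alpha_star, b_star), best_J)
print(f"closed form J = {J(alpha_star, b_star):.6f}, brute force J = {best_J:.6f}")
```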

Based on the explicitly solved $\alpha^*$, the training objective of XNOR-Net-like BNNs is given in a bilevel form:

$$\mathbf{W}^* = \arg\min_{\mathbf{W}} \mathcal{L}(\mathbf{W}; \mathcal{A}), \quad \text{s.t.} \ \boldsymbol{\alpha}^{n*}, \mathbf{b}_{\mathbf{w}^n}^* = \arg\min_{\boldsymbol{\alpha}^n, \mathbf{b}_{\mathbf{w}^n}} J(\boldsymbol{\alpha}^n, \mathbf{b}_{\mathbf{w}^n}), \tag{3.11}$$
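In practice, the inner problem of Eq. (3.11) is solved in closed form on every forward pass, while the outer problem is handled by ordinary gradient descent on the latent real-valued weights. The PyTorch sketch below illustrates this structure under stated assumptions: the class name `XNORConv2d` and the toy loss are hypothetical, and a straight-through estimator (not covered in this section) is assumed so that gradients can pass through the sign function.

```python
# Sketch of the bilevel training scheme of Eq. (3.11): closed-form inner
# solve (sign + CAM) per forward pass, SGD on latent weights for the outer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class XNORConv2d(nn.Conv2d):  # hypothetical layer name
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight                                    # latent real W
        # Inner problem, closed form: Eq. (3.8) and Eq. (3.10).
        b_w = torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))
        alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)  # per out-channel
        # Straight-through estimator (assumed): forward uses alpha * b_w,
        # backward routes gradients to the latent weights unchanged.
        w_bin = w + (alpha * b_w - w).detach()
        return F.conv2d(x, w_bin, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

# Outer problem: standard SGD on the latent weights W.
layer = XNORConv2d(3, 8, kernel_size=3, padding=1)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
x = torch.randn(2, 3, 8, 8)
loss = layer(x).pow(2).mean()           # toy stand-in for the task loss L
optimizer.zero_grad()
loss.backward()
optimizer.step()
```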

This bilevel formulation is also known as hard binarization [159]. In the following, we show some variants of such a binarization function.